feat: add EAGLE3 support for Step-3.5-Flash by zijiexia · Pull Request #530 · sgl-project/SpecForge

zijiexia · 2026-04-13T16:28:59Z

Summary

New chat template (specforge/data/template.py): registers step3.5, a thinking-enabled template using <|im_start|> / <|im_end|> tokens, matching Step-3.5-Flash's format.
New draft model config (configs/step-3.5-flash-eagle3.json): EAGLE3 architecture config for Step-3.5-Flash, a 1-layer LlamaForCausalLMEagle3 with aux hidden states captured from layers 4, 20, and 40.
Training script (examples/run_step3p5_flash_eagle3_online.sh): end-to-end online training script for EAGLE3 on Step-3.5-Flash with SGLang backend, FA3 attention, and W&B logging.
smoltalk-chinese dataset (scripts/prepare_data.py): adds process_smoltalk_row and wires up zjxia/smoltalk-chinese as a supported dataset option.
Fix sglang_max_total_tokens OOM for SWA models (specforge/args.py): changed target_batch_size * max_length to int(target_batch_size * max_length * 1.2). The 1.2× buffer is driven by three structural properties of SGLang's SWA memory allocator:
1. Page-alignment overhead: alloc_paged_token_slots_extend over-reserves by batch_size × page_size slots on every extend call (mem_cache/common.py:267).
2. Dual-pool double-counting: SWA models maintain two independent pools (full_attn and swa_attn), each independently applying the same overhead check (swa_memory_pool.py:370–414).
3. SWA pool shrinkage: the SWA pool is sized at swa_full_tokens_ratio = 0.8× the full pool, so it exhausts first. Compensating for shrinkage alone would require 1/0.8 = 1.25×, and factors 1 and 2 add further overhead on top, pushing the theoretical requirement slightly above 1.25×. In practice, because page-alignment overhead is small (~0.78% per extend batch at batch=128, page_size=16, max_length=2048), 1.2× is empirically sufficient and avoids unnecessary over-reservation of the token pool.
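
The arithmetic behind the buffer can be checked with a short sketch. This is not the code in specforge/args.py: the function names below are hypothetical, and the only inputs are the figures quoted above (batch=128, page_size=16, max_length=2048, ratio 0.8, buffer 1.2).

```python
def buffered_max_total_tokens(target_batch_size: int, max_length: int,
                              buffer: float = 1.2) -> int:
    """Token-pool size after applying the multiplicative buffer (hypothetical helper)."""
    return int(target_batch_size * max_length * buffer)


def page_alignment_overhead(batch_size: int, page_size: int, max_length: int) -> float:
    """Extra slots as a fraction of the pool, assuming each extend call
    over-reserves batch_size * page_size slots as described in item 1."""
    return (batch_size * page_size) / (batch_size * max_length)


batch, page, max_len = 128, 16, 2048

baseline = batch * max_len                              # previous setting
buffered = buffered_max_total_tokens(batch, max_len)    # setting with the 1.2x buffer
overhead = page_alignment_overhead(batch, page, max_len)
shrink_compensation = 1 / 0.8                           # factor implied by the 0.8 ratio

print(f"baseline tokens:         {baseline}")
print(f"buffered tokens:         {buffered}")
print(f"page-alignment overhead: {overhead:.2%}")
print(f"shrinkage compensation:  {shrink_compensation:.2f}x")
```

The overhead fraction reduces to page_size / max_length, which is why it stays below 1% for the quoted configuration.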

Test plan

Run run_step3p5_flash_eagle3_online.sh and confirm training loop starts without OOM
Verify smoltalk-chinese dataset processes correctly via prepare_data.py --dataset smoltalk-chinese
Confirm step3.5 template tokenizes a sample conversation as expected
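
For the last check, a rough way to see what a conversation looks like when wrapped in <|im_start|> / <|im_end|> tokens is sketched below. This is a minimal illustration only: render_chatml is a hypothetical helper, not the SpecForge template API, and the exact layout of the step3.5 template (newlines, system prompt, thinking markers) is an assumption that should be checked against specforge/data/template.py.

```python
IM_START = "<|im_start|>"
IM_END = "<|im_end|>"


def render_chatml(messages):
    """Wrap each message as <|im_start|>role\\ncontent<|im_end|>\\n (assumed layout)."""
    return "".join(
        f"{IM_START}{m['role']}\n{m['content']}{IM_END}\n" for m in messages
    )


sample = [
    {"role": "user", "content": "What is speculative decoding?"},
    {"role": "assistant", "content": "A way to draft tokens cheaply and verify them."},
]

rendered = render_chatml(sample)
print(rendered)
```

Comparing a string like this with the text produced by the registered template for the same conversation is one way to spot missing or duplicated delimiter tokens.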

🤖 Generated with Claude Code

gemini-code-assist · 2026-04-13T16:29:04Z

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

zijiexia · 2026-04-13T16:52:02Z

This support also requires changes on the SGLang side; a PR has been raised: sgl-project/sglang#22718

zijiexia added 10 commits March 17, 2026 22:11

feat: add EAGLE3 support and new template for step3.5

e4b363f

add training script

581cdfd

reproduce 03

f73907b

use regen train the draft model

352a647

feat: add support for smoltalk-chinese dataset processing

2d28645

Merge branch 'main' into train_step3p5

97af602

update smoltalk-chinese

8f7f391

fix: correct train data path in run_step3p5_flash_eagle3_online.sh

119ec6a

lint

de9b1d8

remove unused file

c271055

zijiexia requested review from FlamingoPg, FrankLeeeee, shuaills, sleepcoo and zyksir as code owners April 13, 2026 16:29

zijiexia wants to merge 10 commits into sgl-project:main from zijiexia:support_step3p5
